Frequent Patterns that Compress
Abstract
One of the major problems in frequent pattern mining is the explosion of the number of results, making it difficult to identify the interesting frequent patterns. In a recent paper [14] we have shown that an MDL-based approach gives a dramatic reduction of the number of frequent item sets to consider. Here we show that MDL gives similarly good reductions for frequent patterns on other types of data, viz., on sequences and trees. Reductions of two to three orders of magnitude are easily attained on data sets from the webmining field.
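To make the MDL idea concrete, the following is a minimal sketch for item sets, not the method of [14] itself. It assumes a simplified two-part code: the database is covered greedily with patterns from a code table, each used pattern gets a Shannon-optimal code length derived from its usage count, and a candidate pattern is kept only if it shrinks the total encoded size (model plus data). The function names `cover`, `total_size`, and `select_patterns`, the fixed per-item cost in the model, and the candidate ordering are all assumptions made for illustration.

```python
import math
from collections import Counter


def cover(transaction, code_table):
    """Greedily cover a transaction with patterns, in code-table order."""
    remaining = set(transaction)
    used = []
    for pattern in code_table:
        if pattern <= remaining:
            used.append(pattern)
            remaining -= pattern
    return used


def total_size(database, code_table, n_items):
    """Encoded size in bits: L(model) + L(data | model).

    Assumption: each item listed in the model costs log2(n_items) bits,
    and only patterns that are actually used are charged to the model.
    """
    usage = Counter()
    for transaction in database:
        usage.update(cover(transaction, code_table))
    total = sum(usage.values())
    bits_per_item = math.log2(max(2, n_items))
    data_bits = 0.0
    model_bits = 0.0
    for pattern, count in usage.items():
        code_len = -math.log2(count / total)  # Shannon-optimal code length
        data_bits += count * code_len
        model_bits += len(pattern) * bits_per_item + code_len
    return model_bits + data_bits


def select_patterns(database, candidates):
    """Keep a candidate only if it reduces the total encoded size."""
    items = sorted({item for t in database for item in t})
    singletons = [frozenset([item]) for item in items]
    chosen = []
    best = total_size(database, singletons, len(items))
    for candidate in candidates:
        # Longer patterns come first so they get the chance to cover first;
        # singletons at the end guarantee every transaction is fully covered.
        trial = sorted(chosen + [candidate], key=len, reverse=True)
        size = total_size(database, trial + singletons, len(items))
        if size < best:
            chosen, best = trial, size
    return chosen
```

In this sketch a pattern that never helps to cover the data leaves the encoded size unchanged and is therefore rejected, which is the mechanism by which an MDL criterion prunes a large set of frequent patterns down to a small one. The same accept-if-it-compresses loop would apply to sequences or trees, given a suitable notion of covering.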
Similar resources
A Compact FP-Tree for Fast Frequent Pattern Retrieval
Frequent patterns are useful in many data mining problems, including query suggestion. Frequent patterns can be mined through the frequent pattern tree (FP-tree) data structure, which is used to store a compact (or compressed) representation of a transaction database (Han et al., 2000). In this paper, we propose an algorithm to compress a frequent pattern set into a smaller one, and store the set in a...
Efficient Associating Mining Approaches for Compressing Incrementally Updatable Native XML Databases
XML-based applications are widely used for data exchange in EC and digital archives. However, the compression of native XML databases has been surprisingly neglected, especially for huge and rapidly updated databases. These two factors motivate us to develop an approach to efficiently compress native XML databases and dynamically maintain...
Pattern-growth Methods for Frequent Pattern Mining
Mining frequent patterns from large databases plays an essential role in many data mining tasks and has broad applications. Most of the previously proposed methods adopt Apriori-like candidate-generation-and-test approaches. However, those methods may encounter serious challenges when mining datasets with prolific patterns and/or long patterns. In this work, we develop a class of novel and effic...
On compressing frequent patterns
A major challenge in frequent-pattern mining is the sheer size of its mining results. To compress the frequent patterns, we propose to cluster frequent patterns with a tightness measure δ (called δ-cluster), and select a representative pattern for each cluster. The problem of finding a minimum set of representative patterns is shown to be NP-hard. We develop two greedy methods, RPglobal and RPlocal. ...